Efficient Computation of Low-Rank Representations to Reduce Memory Requirements in LLM Training
Conference Presentation (Invited) | FZJ-2024-06889
2024
Please use a persistent id in citations: doi:10.34734/FZJ-2024-06889
Abstract: The OpenGPT-X project represents one of Europe’s pioneering publicly funded efforts in the domain of large language models (LLMs), covering the entire lifecycle from pre-training foundation models to fine-tuning and practical application development. To maximize training efficiency on High Performance Computing (HPC) resources, we explore strategies for reducing computational and memory demands. A promising avenue exploits the low-rank structure of gradients, as done in the LoRA and GaLore frameworks; the latter relies on computing dominant low-rank subspaces during training. The randomized range finder algorithm offers a more efficient alternative to computing a full singular value decomposition (SVD). We introduce a novel variant of the range finder, based on a blocked Householder QR decomposition, optimized for modern GPU accelerators.
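For context, below is a minimal NumPy sketch of the classic randomized range finder (Halko, Martinsson & Tropp, 2011) that the abstract references as the efficient alternative to a full SVD. The function name `randomized_range_finder` and its parameters are illustrative; the sketch shows the standard Gaussian-sampling variant with an ordinary thin QR step, not the blocked Householder QR variant introduced in the talk.

```python
import numpy as np

def randomized_range_finder(A, rank, oversample=10, seed=None):
    """Approximate an orthonormal basis Q for the dominant range of A.

    Classic randomized range finder (Halko, Martinsson & Tropp, 2011):
    sample the range of A with a Gaussian test matrix, then
    orthonormalize the samples with a thin QR decomposition.
    Illustrative sketch; not the blocked Householder variant from the talk.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Gaussian test matrix; a few extra columns tighten the approximation.
    Omega = rng.standard_normal((n, rank + oversample))
    Y = A @ Omega                 # sample the range of A
    Q, _ = np.linalg.qr(Y)       # thin QR: Q is m x (rank + oversample)
    return Q

if __name__ == "__main__":
    # Synthetic gradient-like matrix with a rapidly decaying spectrum.
    rng = np.random.default_rng(0)
    U, _ = np.linalg.qr(rng.standard_normal((2000, 50)))
    V, _ = np.linalg.qr(rng.standard_normal((500, 50)))
    s = np.logspace(0, -6, 50)
    G = (U * s) @ V.T

    # Project onto the dominant low-rank subspace, avoiding the
    # O(m * n * min(m, n)) cost of a full SVD.
    Q = randomized_range_finder(G, rank=20)
    G_low = Q @ (Q.T @ G)
    print("relative error:", np.linalg.norm(G - G_low) / np.linalg.norm(G))
```

Oversampling by a few extra columns is the standard way to tighten the subspace approximation; the contribution described in the abstract replaces the QR orthonormalization step with a blocked Householder formulation better suited to modern GPU accelerators.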