Efficient Computation of Low-Rank Representations to Reduce Memory Requirements in LLM Training

Penke, Carolin

Marc 21

001			1034068
005			20250203103429.0
024	7	_	\|a 10.34734/FZJ-2024-06889 \|2 datacite_doi
037	_	_	\|a FZJ-2024-06889
100	1	_	\|a Penke, Carolin \|0 P:(DE-Juel1)192254 \|b 0 \|e Corresponding author \|u fzj
111	2	_	\|a LoRAINNe’24: workshop on LOw-Rank Approximations and their Interactions with Neural NEtworks \|g LoRAINNe’24 \|c Nancy \|d 2024-11-26 - 2024-11-27 \|w France
245	_	_	\|a Efficient Computation of Low-Rank Representations to Reduce Memory Requirements in LLM Training
260	_	_	\|c 2024
336	7	_	\|a Conference Paper \|0 33 \|2 EndNote
336	7	_	\|a Other \|2 DataCite
336	7	_	\|a INPROCEEDINGS \|2 BibTeX
336	7	_	\|a conferenceObject \|2 DRIVER
336	7	_	\|a LECTURE_SPEECH \|2 ORCID
336	7	_	\|a Conference Presentation \|b conf \|m conf \|0 PUB:(DE-HGF)6 \|s 1736500131_6151 \|2 PUB:(DE-HGF) \|x Invited
520	_	_	\|a The OpenGPT-X project represents one of Europe’s pioneering publicly funded efforts in the domain of large language models (LLMs), covering the entire lifecycle from pre-training foundational models to fine-tuning and practical application development. To maximize the efficiency of training on High Performance Computing (HPC) resources, strategies aimed at reducing computational and memory demands are being explored. A promising avenue exploits the low-rank structure of gradients, as done in the LoRA or GaLore frameworks, the latter of which relies on the computation of dominant low-rank subspaces during training. The randomized range finder algorithm provides a more efficient alternative to computing a full singular value decomposition (SVD). We introduce a novel variant of the range finder, based on the blocked Householder QR decomposition, optimized for modern GPU accelerators.
536	_	_	\|a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511) \|0 G:(DE-HGF)POF4-5112 \|c POF4-511 \|f POF IV \|x 0
536	_	_	\|a OpenGPT-X - Aufbau eines Gaia-X Knotens für große KI-Sprachmodelle und innovative Sprachapplikations-Services; Teilvorhaben: Optimierung und Skalierung auf großen HPC-Systemen (68GX21007F) \|0 G:(DE-Juel-1)68GX21007F \|c 68GX21007F \|x 1
856	4	_	\|u https://juser.fz-juelich.de/record/1034068/files/Penke_LowRankRepresentationsToReduceMemoryInLLMs.pdf \|y OpenAccess
909	C	O	\|o oai:juser.fz-juelich.de:1034068 \|p openaire \|p open_access \|p VDB \|p driver
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 0 \|6 P:(DE-Juel1)192254
913	1	_	\|a DE-HGF \|b Key Technologies \|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action \|1 G:(DE-HGF)POF4-510 \|0 G:(DE-HGF)POF4-511 \|3 G:(DE-HGF)POF4 \|2 G:(DE-HGF)POF4-500 \|4 G:(DE-HGF)POF \|v Enabling Computational- & Data-Intensive Science and Engineering \|9 G:(DE-HGF)POF4-5112 \|x 0
914	1	_	\|y 2024
915	_	_	\|a OpenAccess \|0 StatID:(DE-HGF)0510 \|2 StatID
920	1	_	\|0 I:(DE-Juel1)JSC-20090406 \|k JSC \|l Jülich Supercomputing Center \|x 0
980	1	_	\|a FullTexts
980	_	_	\|a conf
980	_	_	\|a VDB
980	_	_	\|a UNRESTRICTED
980	_	_	\|a I:(DE-Juel1)JSC-20090406
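
The abstract (field 520) contrasts the randomized range finder with a full SVD for obtaining a dominant low-rank subspace. A minimal sketch of the textbook randomized range finder follows, assuming NumPy. It is not the author's GPU-optimized variant, which this record does not describe: the CPU routine `np.linalg.qr` (a LAPACK Householder QR) stands in for the blocked Householder QR mentioned in the abstract. The function name, the `oversample` parameter, and the synthetic matrix `G` are illustrative assumptions.

```python
import numpy as np


def randomized_range_finder(A, rank, oversample=4, seed=0):
    """Return Q with orthonormal columns approximately spanning range(A).

    Textbook scheme: sample the range of A with a Gaussian test matrix,
    then orthonormalize the samples with a QR decomposition.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    k = min(rank + oversample, n)        # sketch width (assumed heuristic)
    omega = rng.standard_normal((n, k))  # random test matrix
    Y = A @ omega                        # samples of the column space of A
    Q, _ = np.linalg.qr(Y)               # reduced QR: Q has shape (m, k)
    return Q


# Hypothetical stand-in for a gradient matrix with exact rank 4.
rng = np.random.default_rng(1)
G = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 48))

Q = randomized_range_finder(G, rank=4)
G_small = Q.T @ G                        # compact (k x n) representation
rel_err = np.linalg.norm(G - Q @ G_small) / np.linalg.norm(G)
```

Only `Q` and the smaller matrix `G_small` would need to be kept to represent `G` in this sketch, which is where the memory saving of such projections comes from. No SVD of the full matrix is computed.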
