Efficient Computation of Low-Rank Representations to Reduce Memory Requirements in Deep Learning

Penke, Carolin

Marc 21

001			1034062
005			20250203103147.0
024	7	_	\|a 10.34734/FZJ-2024-06883 \|2 datacite_doi
037	_	_	\|a FZJ-2024-06883
100	1	_	\|a Penke, Carolin \|0 P:(DE-Juel1)192254 \|b 0 \|e Corresponding author \|u fzj
111	2	_	\|a RWTH Aachen SFB 1481 Colloquium \|c Aachen \|d 2024-12-11 - 2024-12-11 \|w Germany
245	_	_	\|a Efficient Computation of Low-Rank Representations to Reduce Memory Requirements in Deep Learning \|f 2024-09-11 -
260	_	_	\|c 2024
336	7	_	\|a Conference Paper \|0 33 \|2 EndNote
336	7	_	\|a Other \|2 DataCite
336	7	_	\|a INPROCEEDINGS \|2 BibTeX
336	7	_	\|a LECTURE_SPEECH \|2 ORCID
336	7	_	\|a Talk (non-conference) \|b talk \|m talk \|0 PUB:(DE-HGF)31 \|s 1736497335_6155 \|2 PUB:(DE-HGF) \|x Invited
336	7	_	\|a Other \|2 DINI
520	_	_	\|a Computing an orthogonal basis that approximates the range or corange of a matrix is a ubiquitous problem in computational science and engineering. In numerous applications, a rapid decay of singular values permits the use of such bases to approximate a linear operator by restricting it to low-rank subspaces, thereby significantly reducing computational and storage demands. A powerful approach for constructing a basis with a specified rank or approximation tolerance is the (adaptive) randomized range finder. In this talk, we introduce a novel variant of this algorithm, based on the blocked Householder QR decomposition, optimized for modern GPU accelerators. This development is motivated by its potential to substantially lower memory requirements during the training of deep neural networks such as transformers. We discuss the GaLore (Gradient Low-Rank Projection) training framework, and demonstrate how the randomized range finder can be employed to derive low-rank representations of optimizer states. Further potential avenues for future research are discussed.
536	_	_	\|a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511) \|0 G:(DE-HGF)POF4-5112 \|c POF4-511 \|f POF IV \|x 0
536	_	_	\|a OpenGPT-X - Aufbau eines Gaia-X Knotens für große KI-Sprachmodelle und innovative Sprachapplikations-Services; Teilvorhaben: Optimierung und Skalierung auf großen HPC-Systemen (68GX21007F) \|0 G:(DE-Juel-1)68GX21007F \|c 68GX21007F \|x 1
856	4	_	\|u https://juser.fz-juelich.de/record/1034062/files/LowRankRepresentationsDL.pdf \|y OpenAccess
909	C	O	\|o oai:juser.fz-juelich.de:1034062 \|p openaire \|p open_access \|p VDB \|p driver
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 0 \|6 P:(DE-Juel1)192254
913	1	_	\|a DE-HGF \|b Key Technologies \|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action \|1 G:(DE-HGF)POF4-510 \|0 G:(DE-HGF)POF4-511 \|3 G:(DE-HGF)POF4 \|2 G:(DE-HGF)POF4-500 \|4 G:(DE-HGF)POF \|v Enabling Computational- & Data-Intensive Science and Engineering \|9 G:(DE-HGF)POF4-5112 \|x 0
914	1	_	\|y 2024
915	_	_	\|a OpenAccess \|0 StatID:(DE-HGF)0510 \|2 StatID
920	1	_	\|0 I:(DE-Juel1)JSC-20090406 \|k JSC \|l Jülich Supercomputing Center \|x 0
980	1	_	\|a FullTexts
980	_	_	\|a talk
980	_	_	\|a VDB
980	_	_	\|a UNRESTRICTED
980	_	_	\|a I:(DE-Juel1)JSC-20090406
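
The abstract in field 520 describes the randomized range finder only in words. As a rough illustration, the basic fixed-rank version of that idea can be sketched in NumPy as below. This is not the blocked Householder QR variant for GPUs presented in the talk, and it is not the GaLore implementation. The function name `randomized_range_finder`, the `oversample` parameter, and the test matrix are assumptions made for this sketch.

```python
import numpy as np


def randomized_range_finder(A, rank, oversample=5, seed=0):
    """Return Q with orthonormal columns whose span approximates range(A).

    Hypothetical helper for illustration: basic fixed-rank scheme, with
    plain NumPy QR standing in for the GPU-optimized variant of the talk.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)      # number of random samples
    omega = rng.standard_normal((n, k))   # Gaussian test matrix
    Y = A @ omega                         # sample the range of A
    Q, _ = np.linalg.qr(Y)                # reduced QR: Q is m x k
    return Q


# Assumed example data: a 60 x 40 matrix of exact rank 5.
rng = np.random.default_rng(1)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))

Q = randomized_range_finder(A, rank=5)

# Restricting A to the subspace spanned by Q gives the low-rank form Q (Q^T A).
B = Q.T @ A                               # small k x n factor
rel_err = np.linalg.norm(A - Q @ B) / np.linalg.norm(A)
```

In a GaLore-style setting, `A` would play the role of a gradient matrix and `B` the projected quantity on which optimizer states are kept. That mapping is stated here only as an interpretation of the abstract, not as a description of the actual code.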
